- [Chris Pirazzi provides the following useful information (thanks,
- Chris!) -Doug]
-
- From cpirazzi@cp.esd.sgi.com Mon Jul 24 22:57:19 1995
- Date: Mon, 24 Jul 1995 22:57:18 -0700
- From: cpirazzi@cp.esd.sgi.com (Chris Pirazzi)
- Message-Id: <199507250557.WAA05094@cp.esd.sgi.com>
- To: cook@cp.esd.sgi.com
- Subject: floating point underflow exceptions
- Status: OR
-
-
- you might want to include some info from here on
- your signal processing faq.
-
- this has saved me several times.
-
- this is the final version of my faq item
-
- ============================================================================
-
-
- well, it has taken me several months, but I have finally tracked down
- all the bugs and needed details concerning floating point underflow
- exception handling.
-
- this information should help people to identify a condition that could
- speed up their signal processing code by hundreds of times.
-
- ------------------------------
-
- FAQ item for comp.sys.sgi.misc and comp.sys.sgi.audio FAQs
-
- (I guess it should probably be in .misc, and there should be a pointer
- question in .audio. It is definitely NOT just an audio thing, but
- audio people really want to know this.)
-
- ------------------------------
-
- Subject: -XX- why does my floating point signal processing routine,
- when given certain inputs, run incredibly slowly and
- consume all of the CPU in _system_ or _interrupt_ time?
- Date: Mon Jul 24 22:28:13 PDT 1995
-
- You may be experiencing an undesirable "floating point underflow"
- behavior of the floating point unit on R3k's and beyond. Roughly, a
- floating point underflow (defined in IEEE standard 754) occurs when a
- floating point operation creates a non-zero result whose absolute
- value is too small to be represented as a normalized number in the
- result format. When an underflow occurs that is not somehow masked, the
- FPU causes an interrupt (R3k) or trap (R4k and later) on the CPU, the
- CPU runs some code to handle the trap, and then the floating point
- instruction in your program completes. This happens once for each
- underflowing instruction.
-
- Recursive filters will often generate large numbers of underflows in
- large spans, and so they often clearly reveal the slow processing of
- these exceptions. Code can easily run hundreds or thousands of times
- more slowly if it underflows on every operation.
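
The effect is easy to reproduce off-line. The following is a minimal sketch in portable C (not SGI-specific, and not part of the original post). It feeds a single impulse through a hypothetical one-pole recursive filter y[n] = a*y[n-1] + x[n] and counts how many outputs land in the subnormal range, that is, non-zero with magnitude below FLT_MIN. The function names, the filter and the coefficient are assumptions chosen for illustration. On an FPU that traps on underflow, each counted sample would likely correspond to one trap.

```c
#include <float.h>

/* Hypothetical helper: true if v is non-zero but smaller in magnitude
   than the smallest normalized float (i.e. v is subnormal). */
static int is_subnormal(float v)
{
    return v != 0.0f && v < FLT_MIN && v > -FLT_MIN;
}

/* Feed one unit impulse followed by silence through
   y[n] = a*y[n-1] + x[n] and count the subnormal outputs. */
long count_subnormal_outputs(float a, long nsamples)
{
    volatile float y = 0.0f;  /* volatile forces rounding to float each step */
    long count = 0;
    long n;

    for (n = 0; n < nsamples; n++) {
        float x = (n == 0) ? 1.0f : 0.0f;  /* impulse, then silence */
        y = a * y + x;
        if (is_subnormal(y))
            count++;
    }
    return count;
}
```

For example, count_subnormal_outputs(0.5f, 1000) reports how many of the 1000 outputs were subnormal. With a coefficient closer to 1, the span of subnormal outputs grows much longer. The count will be zero if the compiler or FPU is already flushing subnormals to zero.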
-
- - How Do I Tell if I'm Getting Lots of Floating Point Underflows?
-
- On R3k machines, programs that are performing lots of operations which
- underflow will eat up the CPU in "intr" time (the yellow part of the
- CPU bar on gr_osview).
-
- On R4k and later machines, such programs will eat up the CPU in
- "system" time (the red part of the CPU bar on gr_osview), since trap
- handling counts towards "system" time.
-
- You can check for these underflows (and other exceptions) more
- reliably by temporarily linking your program with -lfpe, setting the
- environment variable TRAP_FPE to 'DEBUG;ALL=COUNT(1)', and running
- your program. This will print something every time any floating point
- exception occurs. You probably want to change the 1 to some large
- number so that you can see just how many FPE's occur without waiting
- for thousands of printfs.
-
- - How Do I Fix It?
-
- There are two methods:
-
- 1. Link your program with -lfpe, and execute the following code
- snippet in your program once, before your signal processing code:
-
- #include <sigfpe.h>
-
- /*
- set underflowing values to zero (_ZERO), but in particular,
- also set the special "flush zero" bit (FS, bit 24) in the
- Control Status register. This bit exists in R4k and later
- processors. This special bit will cause the FPU not to
- generate an exception for floating point underflows, and
- quietly substitute zero instead. On R3k CPUs, this
- setting will be treated just like "_ZERO."
- */
- sigfpe_[_UNDERFL].repls = _FLUSH_ZERO;
- handle_sigfpes(_ON, _EN_UNDERFL, NULL,
- _ABORT_ON_ERROR, NULL);
-
-
- 2. (works on R4000 and later processors ONLY) execute the following
- code snippet in your program once, before your signal processing code
- (linking with libfpe is neither required nor recommended for method
- 2):
-
- #include <sys/fpu.h>
-
- /*
- set the special "flush zero" bit (FS, bit 24) in the
- Control Status Register of the FPU of R4k and beyond
- so that the result of any underflowing operation will
- be clamped to zero, and no exception of any kind will
- be generated on the CPU. This has no effect on
- an R3000.
- */
- void flush_all_underflows_to_zero(void)
- {
-     union fpc_csr f;
-
-     f.fc_word = get_fpc_csr();   /* read the current Control Status Register */
-     f.fc_struct.flush = 1;       /* set the FS (flush zero) bit */
-     set_fpc_csr(f.fc_word);      /* write the modified value back */
- }
-
- Method 2 is highly recommended and preferred for development of any
- code that does not need to execute on R3000 CPUs. See below for why.
- Note that any code compiled with -mips2 or higher already has this
- restriction built in, and so should use method 2.
-
- In general, it makes sense for all signal processing code to include
- one of these code snippets for better performance, since it doesn't
- hurt, and since underflows often come up unexpectedly.
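
One way to follow this advice is to call the setup routine once at program start, before any filtering. The sketch below wraps method 2 in a hypothetical init function guarded by the `__sgi` predefined macro, so that the same source still compiles on other systems. The function name and the return convention are assumptions for illustration, and the IRIX branch simply repeats the method 2 register operations.

```c
#ifdef __sgi
#include <sys/fpu.h>   /* union fpc_csr, get_fpc_csr(), set_fpc_csr() */
#endif

/* Hypothetical wrapper: returns 1 if the flush-zero bit was requested,
   0 if this build has no IRIX FPU control interface. */
int enable_flush_to_zero(void)
{
#ifdef __sgi
    union fpc_csr f;

    f.fc_word = get_fpc_csr();
    f.fc_struct.flush = 1;      /* FS bit: underflows become zero, no trap */
    set_fpc_csr(f.fc_word);
    return 1;
#else
    return 0;                   /* nothing to do on other platforms */
#endif
}
```

Call enable_flush_to_zero() once near the top of main(), before the first call into the signal processing code.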
-
- - What Is Going On?
-
- Note that on R4k's and above, method 1 ends up performing exactly the
- same Control Status Register operation as method 2, plus a lot of
- other unnecessary stuff. Method 1 requires the application to link
- with libfpe. This library was designed for use in trapping and
- debugging floating point exceptions, not silencing them. For example,
- any app that links with libfpe must deal with several subtle and nasty
- side effects relating to signal handling. This is why we strongly
- recommend the use of method 2 whenever possible.
-
- But, as we mentioned, method 1 is the only solution available for R3k
- machines. On R3k's, the code snippet above causes the interrupt
- handler for the underflow exception to clamp the underflowing value to
- zero. This code does NOT prevent the FPU from issuing future
- underflow interrupts (these interrupts cannot be disabled on the R3k),
- but it does severely decrease the likelihood that you will run into
- serious performance degradations due to underflow. This is because
- the underflows in a typical recursive filter come in large spans of
- several thousand underflows that occur before the accumulated value
- finally reaches zero. This libfpe setting "catches" and clamps such
- underflow spans at the moment that they begin. Note that we have used
- the constant _FLUSH_ZERO instead of _ZERO so that this snippet also
- solves the underflow problems on R4ks and beyond. On R3ks,
- _FLUSH_ZERO and _ZERO are equivalent.
-
- On R4k and later FPUs, method 1 and method 2 both set a bit on the FPU
- which prevents the FPU from issuing any trap on underflow. The FPU
- quietly substitutes zero for the result of underflowing operations.
- Therefore, this setting is even more effective than it is on R3k's.
- It may still be less efficient than an algorithm which never
- underflows, though. On processors later than R4k (R8k in fast mode,
- for example), this behavior may be the default, so you may never see
- the problem.
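
As an illustration of "an algorithm which never underflows", one common technique (not from the original post, and offered only as a sketch) is to clamp the filter state to zero in software once it drops below a threshold that is negligibly small for the signal but still far above the subnormal range. The threshold 1.0e-15f and the function name below are assumptions. The approach also assumes that the input is either zero or reasonably scaled and that the coefficient magnitude is not itself tiny, so that the product of the coefficient and any surviving state stays normalized.

```c
/* Assumed threshold: far below any useful signal level, yet far above
   the smallest normalized float (about 1.2e-38). */
#define TINY_STATE 1.0e-15f

/* One step of y[n] = a*y[n-1] + x[n]; *state holds y[n-1] on entry and
   y[n] on return. Outputs smaller in magnitude than TINY_STATE are
   replaced by zero, so a decaying tail is cut off long before it can
   reach the subnormal range. */
float one_pole_clamped(float a, float x, float *state)
{
    float y = a * (*state) + x;

    if (y < TINY_STATE && y > -TINY_STATE)
        y = 0.0f;
    *state = y;
    return y;
}
```

The cost is one extra comparison per sample, which may or may not be cheaper than relying on the FPU's flush-zero bit; it does have the advantage of behaving the same on every CPU.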
-
- KNOWN BUGS:
-
- In IRIX 5.3, two regressions occurred that did not exist in earlier
- IRIX releases.
-
- R4600 CPUs: the libfpe library (and thus method 1) does not work at
- all on R4600s. Any attempt to set libfpe options will result in a
- message about "unknown CPU type." A patch is in the works for this.
- Contact your SGI service representative about possible patches for
- internal bug number 275803 or for libfpe. You can also use method 2
- to get around this bug.
-
- R3000 CPUs: the libfpe library (and thus method 1) does not work at
- all on R3000s. Any attempt to set libfpe options will result in a
- message about "cause bits" and an abort (core dump). A patch is in
- the works for this rather serious regression. Contact your SGI
- service representative about possible patches for internal bug number
- 276012. This bug is a kernel bug and requires a kernel patch.
- Programs that attempt to intercept SIGFPE directly (i.e., not via
- libfpe) are also affected by this bug.
-
- NEW INFO: as of this writing, the fix for this R3000 bug made it into
- the R3000 kernel patch number 676, which has not yet been released.
- By the time you read this, there may be another higher-numbered R3000
- kernel patch that includes the fixes of patch 676 and other fixes too.
- Contact your SGI service representative to be sure.
-
-